

Model Selection in Contextual Stochastic Bandit Problems

Neural Information Processing Systems

We study bandit model selection in stochastic environments. Our approach relies on a master algorithm that selects between candidate base algorithms. We develop a master-base algorithm abstraction that can work with general classes of base algorithms and different types of adversarial master algorithms. Our methods rely on a novel and generic smoothing transformation for bandit algorithms that permits us to obtain optimal $O(\sqrt{T})$ model selection guarantees for stochastic contextual bandit problems as long as the optimal base algorithm satisfies a high-probability regret guarantee. We show through a lower bound that even when one of the base algorithms has $O(\log T)$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret in model selection, even asymptotically.
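The master-base abstraction described above can be illustrated with a minimal sketch: an EXP3-style adversarial master samples one base bandit algorithm per round, forwards only that base's chosen arm to the environment, and updates its weights with an importance-weighted reward estimate. Everything here is an assumption for illustration (the `EpsGreedyBase` class, the `exp3_master` function, the learning rate); it is not the paper's actual CORRAL-style construction and omits the smoothing transformation entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyBase:
    """Hypothetical base algorithm: epsilon-greedy over K arms,
    standing in for an arbitrary candidate base algorithm."""
    def __init__(self, n_arms, eps):
        self.eps = eps
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def select(self):
        # Explore with probability eps, or while some arm is unpulled.
        if rng.random() < self.eps or self.counts.min() == 0:
            return int(rng.integers(len(self.counts)))
        return int(np.argmax(self.sums / self.counts))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

def exp3_master(bases, means, T, lr=0.1):
    """EXP3-style master: sample a base via exponential weights,
    play that base's arm, and update the sampled base's weight
    with an importance-weighted reward estimate."""
    w = np.zeros(len(bases))  # log-weights over base algorithms
    total = 0.0
    for _ in range(T):
        p = np.exp(w - w.max())
        p /= p.sum()
        i = rng.choice(len(bases), p=p)      # sample a base
        arm = bases[i].select()              # that base picks an arm
        reward = float(rng.random() < means[arm])  # Bernoulli reward
        bases[i].update(arm, reward)         # only the sampled base learns
        w[i] += lr * reward / p[i]           # importance-weighted update
        total += reward
    return total / T

# Two candidate bases with different exploration rates; the master
# must learn which one to trust.
means = [0.2, 0.5, 0.8]
bases = [EpsGreedyBase(3, eps=0.5), EpsGreedyBase(3, eps=0.05)]
avg = exp3_master(bases, means, T=5000)
```

In this toy run the master's average reward should sit well above uniform play, since it shifts probability mass toward the less-exploratory base.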


Supplement to " Model Selection in Contextual Stochastic Bandit Problems "

Neural Information Processing Systems

In Section D we present the proofs for Section 5.1; in Section H we show the proofs of the lower bounds in Section 6. We also briefly outline some other direct applications of our results: CORRAL achieves regret $O(\sqrt{|L| d T})$. B.1 Original Corral. The original Corral algorithm [2] is reproduced below, and we reproduce the EXP3.P algorithm (Figure 3.1 in [...]). The smoothed algorithm's expected replay regret satisfies the stated bound; therefore the total regret is bounded by $6 U(T, \delta) \log(T)$. D.2 Applications of Proposition 5.1. We now show that several algorithms are $(U, \delta, T)$-bounded (Lemma D.2).
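The "replay regret" language above comes from the paper's smoothing transformation. As a rough sketch of the replaying idea (an assumed simplification, not the paper's exact construction): at round t the wrapper records the base algorithm's proposed action and actually plays the proposal from a round drawn uniformly from {1, ..., t}, so the smoothed play's instantaneous quality tracks the base's average behavior so far. The `RoundRobinBase` and `Smoothed` names are hypothetical, and the feedback path is deliberately simplified (a full treatment would need importance weighting).

```python
import random

random.seed(0)

class RoundRobinBase:
    """Trivial stand-in base algorithm that cycles over arms
    (illustration only)."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.t = 0

    def select(self):
        a = self.t % self.n_arms
        self.t += 1
        return a

    def update(self, arm, reward):
        pass  # this toy base ignores feedback

class Smoothed:
    """Smoothing wrapper sketch: record the base's proposal each
    round, then play the proposal of a uniformly random round
    s <= t, i.e. "replay" a past round."""
    def __init__(self, base):
        self.base = base
        self.proposals = []

    def select(self):
        self.proposals.append(self.base.select())
        return random.choice(self.proposals)  # uniform over rounds 1..t

    def update(self, arm, reward):
        self.base.update(arm, reward)

alg = Smoothed(RoundRobinBase(3))
plays = [alg.select() for _ in range(100)]
```

Every action the wrapper emits is one the base itself proposed at some earlier (or the current) round, which is the property the expected-replay-regret bound above is stated for.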


Review for NeurIPS paper: Model Selection in Contextual Stochastic Bandit Problems

Neural Information Processing Systems

Weaknesses: I believe the paper could present more context around the results in sections 4.2 through 4.4. What other results exist in these settings? Similarly, the numerical experiments are not discussed at all and are hard to interpret. I find it hard to draw conclusions from these experiments. I would recommend the description of the algorithm be moved ahead of section 4. Section 4 is hard to judge on the first read as the algorithm generating the claimed results is not yet presented.


Review for NeurIPS paper: Model Selection in Contextual Stochastic Bandit Problems

Neural Information Processing Systems

Three referees reviewed the paper and initially raised several concerns. However, the rebuttal addressed and overcame the reviewers' objections, leading to a unanimous final decision to accept the paper.

